fix(pickled): correct toolbar scenario, add 3 external legibility scenarios #3595

Closed
caio-pizzol wants to merge 1 commit into main from caio-pizzol/pickled-external-suite

@caio-pizzol
Contributor

Follow-up to #3494. Fixes a scoring bug in the merged toolbar scenario and grows the external suite from 1 to 4 scenarios.

The toolbar scenario shipped with two false negatives, both confirmed against a real run. First, expected.excludes banned createHeadlessToolbar even though the prompt asks "what should I avoid?", so a correct answer naming it as the thing to avoid failed. Second, the scenario required useSuperDocUI, which is a real export but is absent from the docs bundle, so docs-grounded answers failed. The scenario is now scored positives-only: SuperDocUIProvider + superdoc/ui/react.
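
To make the first false negative concrete, here is a minimal sketch of a substring-based scorer. It is not Pickled's actual scoring code, which this PR does not show; the `score` function, its parameters, and the sample answer are all hypothetical and exist only to illustrate why an exclude on createHeadlessToolbar fails a correct answer.

```python
from typing import Sequence


def score(answer: str, includes: Sequence[str], excludes: Sequence[str] = ()) -> bool:
    """Hypothetical scorer: pass if every include appears and no exclude appears."""
    text = answer.lower()
    has_all = all(term.lower() in text for term in includes)
    has_banned = any(term.lower() in text for term in excludes)
    return has_all and not has_banned


# A correct answer to "what should I avoid?" has to name the legacy API.
answer = (
    "Wrap the app in SuperDocUIProvider from superdoc/ui/react. "
    "Avoid createHeadlessToolbar for new React UI."
)

# Old shape (assumed): the exclude fires on a correct answer.
old_result = score(
    answer,
    includes=["SuperDocUIProvider", "superdoc/ui/react"],
    excludes=["createHeadlessToolbar"],
)

# New shape: positives only.
new_result = score(answer, includes=["SuperDocUIProvider", "superdoc/ui/react"])

print(old_result, new_result)  # False True
```

The same answer flips from fail to pass once the exclude is dropped, which is the behaviour the positives-only change is after.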

Three new scenarios, each with terms verified present in docs.superdoc.dev/llms-full.txt before locking:

  • Programmatic edits: Document API vs UI (Document API + editor.doc)
  • Enable real-time collaboration (Yjs + modules.collaboration)
  • Is createHeadlessToolbar right for new React UI? (positive check on superdoc/ui/react, no traps)
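
The "verified present before locking" step for the terms above can be sketched roughly as follows. This is an illustration only: `missing_terms`, the scenario ids, and the `sample_bundle` text are hypothetical stand-ins, and a real check would run against the actual contents of llms-full.txt rather than the made-up sample here.

```python
def missing_terms(bundle_text: str, terms: list[str]) -> list[str]:
    """Return the terms that do not occur verbatim in the docs bundle text."""
    return [term for term in terms if term not in bundle_text]


# Stand-in text, NOT the real llms-full.txt; load the real bundle in practice.
sample_bundle = """
Use the Document API through editor.doc for programmatic edits.
Real-time collaboration is built on Yjs and configured under modules.collaboration.
React UI is imported from superdoc/ui/react.
"""

# Scenario ids are hypothetical; the terms come from the list above,
# except the last one, which is invented to show a failing check.
scenario_terms = {
    "programmatic-edits": ["Document API", "editor.doc"],
    "collaboration": ["Yjs", "modules.collaboration"],
    "legacy-toolbar": ["superdoc/ui/react"],
    "absent-example": ["hypotheticalAbsentTerm"],
}

for name, terms in scenario_terms.items():
    absent = missing_terms(sample_bundle, terms)
    status = "ok to lock" if not absent else f"missing: {absent}"
    print(f"{name}: {status}")
```

A scenario whose required terms come back as missing would false-negative the docs cells, so it is either reworded or dropped.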

Two candidates were dropped because the docs bundle contains no crisp deterministic term for them (built-in toolbar customButtons, an export API name). Requiring absent terms would produce false negatives in the docs cells.

Validated before opening: config loads, each scenario's docs cell scores YES on real output, and two sampled paid passes (both interfaces, every toolset) score ~66 with tool-use provenance verified on every web/MCP cell and no false-fires.

One product finding worth a separate task: on web/MCP discovery (no injected docs), agents reliably reach the superdoc/ui/react import path but often not the SuperDocUIProvider component, and at least one cell recommended the Document API for a React toolbar - conflating the mutation surface with the UI surface. That points at a docs/MCP discoverability gap, not a config issue.

No CI wiring or secrets in this PR; runs are manual for now.

@caio-pizzol caio-pizzol requested a review from a team as a code owner June 1, 2026 17:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9741e3165b

Comment thread pickled.yml
Comment on lines +147 to +149
expected:
  paths:
    - "superdoc/ui/react"

P2: Require the legacy scenario to assert rejection

For this new scenario, the only scoring condition is the presence of superdoc/ui/react, so an answer that incorrectly says createHeadlessToolbar/activeEditor.commands is the right approach but happens to mention the modern import path will still pass. Since the prompt is specifically meant to detect whether agents reject the legacy recommendation, this can mark the exact failure mode as successful and skew the Pickled results; add a positive assertion that the answer says the legacy approach is not recommended/should be avoided, or otherwise distinguish rejection from endorsement.
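
One possible way to distinguish rejection from endorsement is sketched below. This is a rough heuristic, not a Pickled feature: `rejects_legacy`, `passes`, and the `REJECTION_CUES` list are hypothetical, and the check only looks for a rejection cue in the same sentence as the legacy term, so negations and paraphrases can still fool it.

```python
import re

# Hypothetical cue list; a real check would need tuning against actual answers.
REJECTION_CUES = ("avoid", "not recommended", "do not use", "don't use", "should not")


def rejects_legacy(answer: str, legacy_term: str = "createHeadlessToolbar") -> bool:
    """True if some sentence naming legacy_term also contains a rejection cue."""
    # Split on sentence-ending punctuation followed by whitespace, so
    # identifiers such as activeEditor.commands stay in one piece.
    sentences = re.split(r"(?<=[.!?])\s+", answer)
    for sentence in sentences:
        lowered = sentence.lower()
        if legacy_term.lower() in lowered and any(cue in lowered for cue in REJECTION_CUES):
            return True
    return False


def passes(answer: str) -> bool:
    """Require the modern import path AND an explicit rejection of the legacy API."""
    return "superdoc/ui/react" in answer and rejects_legacy(answer)


endorsing = (
    "Yes, createHeadlessToolbar with activeEditor.commands is the right approach. "
    "You can also import from superdoc/ui/react."
)
rejecting = (
    "No. createHeadlessToolbar is not recommended for new React UI. "
    "Use superdoc/ui/react instead."
)

print(passes(endorsing), passes(rejecting))  # False True
```

With the path-only condition both answers would pass; adding the rejection check separates them, at the cost of depending on phrasing.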

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

@caio-pizzol
Contributor Author

Closing in favor of #3601, which removes the root pickled.yml. Pickled's config schema changed in the latest CLI (new product/sources/agents/access/questions/checks model; traps removed), so these scenario corrections would land on a schema that no longer loads. The suite will be reintroduced deliberately later, and this pass's learnings fold into that.

@caio-pizzol caio-pizzol closed this Jun 1, 2026